This application is submitted in the name of the following inventor(s):

Inventor          Citizenship      Residence City and State
David L. Isaman   United States    San Diego, California

The assignee is MetaFlow Technology, Inc., having an office at 4250 Executive Square, Suite 300, La Jolla, CA 92037.

Title of Invention

Symbolic Store-Load Bypass

Related Applications

This application claims priority to copending provisional application number 06/114,295, entitled "Symbolic Store-Load Bypass", filed December 31, 1998, by the same inventor.

The inventions described herein can be used in combination or conjunction with inventions described in the following two patent applications:

• Application Serial No. 60/114,296, Express Mail Mailing No. EE506030698US, filed December 31, 1998, in the name of Anatoly Gelman, titled "Call Return Branch Production Buffer," assigned to the same assignee, attorney docket number META-013, and all pending cases claiming priority thereof; and

• Application Serial No. 06/14,297, Express Mail Mailing No. EE506030684US, filed December 31, 1998, in the name of Anatoly Gelman and Russell Schapp, titled "Block-Based Branch Table Buffer," assigned to the same assignee, attorney docket number META-014, and all pending cases claiming priority thereof.

These applications are hereby incorporated by reference as if fully set forth herein. These applications are collectively referred to herein as "incorporated disclosures".

Background of the Invention

1. Field of the Invention

This invention relates to microprocessor design.

2. Related Art

In microprocessors employing a pipelined architecture, it is desirable to be executing as many instructions as possible at once, so that each element of the pipeline is kept busy. However, some instructions, such as instructions that load data from external memory or store data into external memory, must generally be executed in their original sequence order, so as to avoid the external memory ever being in an incorrect state. Moreover, when such instructions refer to identical external memory locations, there is no particular need to wait for the actual external memory operations to complete, as the identical data is already available for the processor to operate with.

One problem in the known art is that determining whether two different instructions refer to the identical location in external memory generally requires computing the actual external memory address referenced by each of the two different instructions. This delays the point at which the determination can be made, because it takes time (and, typically, a pipeline stage) to actually compute the referenced external memory addresses.

Accordingly, it would be advantageous to provide a technique for operating a pipelined microprocessor more quickly, by detecting instructions that load from the same memory locations that were recently stored to, without having to actually compute the referenced external memory addresses. In a preferred embodiment, the microprocessor examines the symbolic structure of instructions as they are encountered, so as to be able to detect identical memory locations by examination of their symbolic structure. For example, instructions that store to and load from an identical offset from an identical register are determined to be referencing the identical memory location, without having to actually compute the complete physical target address.

Summary of the Invention

The invention provides a method and system for operating a pipelined microprocessor more quickly, by detecting instructions that load from the same memory locations that were recently stored to, without having to actually compute the referenced external memory addresses. The microprocessor examines the symbolic structure of instructions as they are encountered, so as to be able to detect identical memory locations by examination of their symbolic structure. For example, in a preferred embodiment, instructions that store to and load from an identical offset from an identical register are determined to be referencing the identical memory location, without having to actually compute the complete physical target address.

Brief Description of the Drawings

Figure 1 shows a block diagram of a system in a pipelined microprocessor for detecting identical locations referenced by different load and store instructions.

Figure 2 shows a process flow diagram of a method for operating a system in a pipelined microprocessor for detecting identical locations referenced by different load and store instructions.

Detailed Description of the Preferred Embodiment

In the following description, a preferred embodiment of the invention is described with regard to preferred process steps and data structures. Embodiments of the invention can be implemented using circuits in a microprocessor or other device, adapted to particular process steps and data structures described herein. Implementation of the process steps and data structures described herein would not require undue experimentation or further invention.

System Elements

Figure 1 shows a block diagram of a system in a pipelined microprocessor for detecting identical locations referenced by different load and store instructions.

A microprocessor 100 includes a sequence of pipeline stages, including an instruction fetch stage 110, an instruction decode stage 120, an address computation stage 130, and an instruction execution stage 140. In a preferred embodiment, the pipeline stages of the microprocessor 100 operate concurrently on sequences of instructions 151 in a pipelined manner. Pipeline operation is known in the art of microprocessor design.
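The concurrent operation of the four stages can be pictured with a minimal Python sketch of an idealized pipeline. It assumes, beyond what the text states, that one instruction enters the fetch stage per clock cycle, that every stage takes exactly one cycle, and that nothing stalls; the function name `pipeline_schedule` and the instruction labels are hypothetical.

```python
from typing import Dict, List, Optional

# Stage names and reference numerals taken from Figure 1.
STAGES = [
    "fetch 110",
    "decode 120",
    "address computation 130",
    "execution 140",
]


def pipeline_schedule(instructions: List[str]) -> List[Dict[str, Optional[str]]]:
    """Return, for each clock cycle, which instruction occupies each stage.

    Idealized model (an assumption, not from the source): one instruction
    enters fetch per cycle, every stage takes one cycle, and nothing stalls.
    """
    n_cycles = len(instructions) + len(STAGES) - 1
    schedule = []
    for cycle in range(n_cycles):
        row: Dict[str, Optional[str]] = {}
        for depth, stage in enumerate(STAGES):
            index = cycle - depth  # the instruction that entered `depth` cycles ago
            row[stage] = instructions[index] if 0 <= index < len(instructions) else None
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_schedule(["i0", "i1", "i2"])):
        busy = {stage: instr for stage, instr in row.items() if instr}
        print(cycle, busy)
```

In cycle 3 of this three-instruction run, decode, address computation and execution hold i2, i1 and i0 respectively, while fetch is already idle.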


In operation, the microprocessor 100 is coupled to an instruction memory 150, which includes a plurality of instructions 151, at least some of which are memory load or store instructions. In a preferred embodiment, the instruction memory 150 includes a random access memory. Memory caching operations can be performed by the instruction memory 150, by input and output elements of the microprocessor 100, or by both. Memory caching operations, as well as other aspects of reading and writing memory locations, are known in the art of computer memories and so are not further described herein.

The microprocessor 100 reads a sequence of instructions 151 from the instruction memory 150 using the instruction fetch stage 110 (and any associated memory read or write elements in the microprocessor 100). In a preferred embodiment, the instruction fetch stage 110 includes an input instruction buffer that holds a plurality of instructions 151 from the instruction memory 150, but there is no particular requirement therefor.

The instruction fetch stage 110 couples the instructions 151 to the instruction decode stage 120.

The instruction decode stage 120 parses the instructions 151 to determine what types of instructions 151 they are (such as instructions 151 that load data from external memory or store data to external memory). As part of parsing the instructions 151, and in addition to determining what operations the instructions 151 command the microprocessor 100 to perform, the instruction decode stage 120 determines the syntax of any addresses in the external memory that the instructions 151 refer to as operands.

For example, an instruction that loads data from external memory has a format that refers to the specific location in external memory from which to load the data. The format can include a base address value and an offset address value, which are added to compute the effective reference address of the instruction 151. The base address value can be a constant value or specify a value found in an internal register of the microprocessor 100. Similarly, the offset address value can be a constant value or specify a value found in an internal register of the microprocessor 100.

Similarly, an instruction that stores data to external memory has a format that refers to the specific location in external memory to which to store the data. The format can similarly include a base address value and an offset address value, which are used to compute the effective reference address of the instruction 151.
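The operand format just described can be sketched in Python as a small data structure. The class names `Reg`, `Const` and `AddressSyntax`, and the register file modeled as a dictionary, are hypothetical; the sketch only shows that the base and the offset may each be a constant or a register, and that the effective reference address is their sum, which is the work the address computation stage 130 performs.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Reg:
    """An address component found in an internal register (hypothetical encoding)."""
    name: str


@dataclass(frozen=True)
class Const:
    """An address component given as a constant in the instruction."""
    value: int


Component = Union[Reg, Const]


@dataclass(frozen=True)
class AddressSyntax:
    """Symbolic form of a memory operand: base address value + offset address value."""
    base: Component
    offset: Component


def component_value(c: Component, regs: Dict[str, int]) -> int:
    # A register component needs a register-file read; a constant does not.
    return regs[c.name] if isinstance(c, Reg) else c.value


def effective_address(addr: AddressSyntax, regs: Dict[str, int]) -> int:
    """The sum that the address computation stage 130 produces."""
    return component_value(addr.base, regs) + component_value(addr.offset, regs)


if __name__ == "__main__":
    regs = {"A": 0x1000, "C": 8}
    print(hex(effective_address(AddressSyntax(Reg("A"), Const(0x20)), regs)))  # 0x1020
    print(hex(effective_address(AddressSyntax(Reg("A"), Reg("C")), regs)))     # 0x1008
```

Because `AddressSyntax` is a frozen dataclass, two operands written the same way compare equal without any register being read.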

The instruction decode stage 120 couples the parts of the instruction 151, including information about the base address value and the offset address value, to the address computation stage 130.

The address computation stage 130 receives the base address value and the offset address value, and computes the effective reference address of the instruction 151.

The instruction decode stage 120 couples the parts of the instruction 151, including information about what operations the instructions 151 command the microprocessor 100 to perform and the syntax of any addresses the instructions 151 refer to as operands, to the instruction execution stage 140. The address computation stage 130 couples the effective reference address of the instruction 151 to the instruction execution stage 140.

The instruction decode stage 120 includes a symbolic load-store bypass element 121. The bypass element 121 examines the parts of the instruction 151, including information about what operations the instructions 151 command the microprocessor 100 to perform. If these operations are to load data from external memory, or to store data to external memory, the bypass element 121 further examines the syntax of any addresses the instructions 151 refer to as operands.

If the operand addresses the instructions 151 refer to include identical base address values and offset address values, the bypass element 121 generates a bypass signal indicating that the instructions 151 refer to the same location in external memory.

When the bypass signal is generated, the address computation stage 130 does not have to compute the actual effective address for the microprocessor 100 to act on the knowledge that the instructions 151 refer to identical locations in external memory.
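The comparison made by the bypass element 121 can be sketched as a purely symbolic test. In this hypothetical Python encoding, a register is written as its name (a string), a constant as an integer, and an operand address as a (base, offset) pair. The function never reads a register or adds anything, which is the point of the technique; a `False` result only means the addresses are not shown to match, since differently written operands can still resolve to the same location.

```python
from typing import Tuple, Union

# Hypothetical encoding: a register is named by a string, a constant is an int.
Component = Union[str, int]
AddressSyntax = Tuple[Component, Component]  # (base address value, offset address value)


def bypass_signal(store_addr: AddressSyntax, load_addr: AddressSyntax) -> bool:
    """Return True when both operands have identical base and identical offset.

    No register contents are read and no effective address is computed.
    False means "not shown to match", not "known to differ".
    """
    store_base, store_offset = store_addr
    load_base, load_offset = load_addr
    return store_base == load_base and store_offset == load_offset


if __name__ == "__main__":
    print(bypass_signal(("A", 16), ("A", 16)))    # True: same register, same offset
    print(bypass_signal(("A", 16), ("A", 24)))    # False: offsets differ
    print(bypass_signal(("A", "C"), ("C", "A")))  # False: sums may match, syntax does not
```

Treating (A, C) and (C, A) as different is conservative; the text only describes matching identical base and offset values.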

For example, suppose that a first instruction 151 to store data refers to a location in external memory determined as (contents of register A) + (fixed offset value B), and a second instruction 151 to load data refers to the same location in external memory, determined as (contents of register A) + (fixed offset value B), where register A and offset value B are identical in both instructions. In this case, the microprocessor 100 can proceed with the knowledge that the first (store) instruction 151 and the second (load) instruction 151 refer to the same location. Since the second (load) instruction 151 is going to read the same data from external memory that the first (store) instruction 151 put there, the microprocessor 100 can proceed by using that data from an internal register, rather than waiting for external memory to complete actual store and load operations.

Although the first (store) instruction 151 would still be physically performed and completed by external memory, the microprocessor 100 can proceed without physically performing the second (load) instruction 151. Instead, the microprocessor 100 can use the identical data from its internal register, thus removing a relative delay in microprocessor 100 operation.
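The worked example above can be traced with a small sequential Python model. It is a sketch under assumptions the text leaves implicit: address syntax is a (base register, constant offset) pair, instructions run one at a time, and the bypass is only trusted while neither the base register nor the register holding the stored data has been rewritten. Only the most recent store is remembered, because an intervening store written with different syntax could alias the same location. The `Instr` class, the `"set"` operation and `run` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Addr = Tuple[str, int]  # (base register, constant offset); a simplification


@dataclass
class Instr:
    op: str                      # "store", "load", or "set" (plain register write)
    reg: str                     # store: data source; load/set: destination
    addr: Optional[Addr] = None  # memory operand syntax for store/load
    value: int = 0               # immediate used only by "set"


def run(program: List[Instr], regs: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Execute in order; return (registers, number of loads that read memory)."""
    memory: Dict[int, int] = {}
    last_store: Optional[Tuple[Addr, str]] = None  # syntax and data register of newest store
    memory_reads = 0
    for ins in program:
        if ins.op == "set":
            regs[ins.reg] = ins.value
        elif ins.op == "store":
            base, off = ins.addr
            memory[regs[base] + off] = regs[ins.reg]  # the store is still performed
            last_store = (ins.addr, ins.reg)          # older stores dropped: this one may alias them
        elif ins.op == "load":
            if last_store is not None and last_store[0] == ins.addr:
                regs[ins.reg] = regs[last_store[1]]   # bypass: data comes from a register
            else:
                base, off = ins.addr
                regs[ins.reg] = memory.get(regs[base] + off, 0)
                memory_reads += 1
        # Rewriting the base register or the stored-data register voids the match.
        if ins.op in ("set", "load") and last_store is not None:
            (store_base, _), data_reg = last_store
            if ins.reg in (store_base, data_reg):
                last_store = None
    return regs, memory_reads


if __name__ == "__main__":
    program = [
        Instr("set", "A", value=0x1000),
        Instr("set", "R1", value=42),
        Instr("store", "R1", ("A", 16)),  # store R1 -> [A + 16]
        Instr("load", "R2", ("A", 16)),   # load R2 <- [A + 16], same syntax
    ]
    regs, reads = run(program, {})
    print(regs["R2"], reads)  # 42 0
```

If register A is rewritten between the store and the load, the record is discarded and the load reads memory at the new address instead.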

Method of Operation

Figure 2 shows a process flow diagram of a method for operating a system in a pipelined microprocessor for detecting identical locations referenced by different load and store instructions.

A method 200 is performed by the microprocessor 100, including its sequence of pipeline stages. In a preferred embodiment, as many steps of the method 200 as possible are performed concurrently in a pipelined manner. Pipeline operation is known in the art of microprocessor design.

At a flow point 210, the microprocessor 100 is coupled to an instruction memory 150, which includes a plurality of instructions 151, and is ready to perform those instructions 151. At least some of those instructions 151 are memory load or store instructions.

At a flow point 211, the microprocessor 100 reads a sequence of instructions 151 from the instruction memory 150 using the instruction fetch stage 110 (and any associated memory read or write elements in the microprocessor 100).

At a step 212, the instruction fetch stage 110 couples the instructions 151 to the instruction decode stage 120.

At a step 213(a), the instruction decode stage 120 parses the instructions 151 to determine whether they are instructions 151 that load data from external memory or store data to external memory.

At a step 213(b), the instruction decode stage 120 determines the syntax of any addresses in the external memory that the instructions 151 refer to as operands.

At a step 214, the bypass element 121 examines the parts of the instruction 151, including information about what operations the instructions 151 command the microprocessor 100 to perform. If these operations are to load data from external memory, or to store data to external memory, the method continues with step 215. Otherwise, the method continues with step 221.

At a step 215, a record of the symbolic operands of the store operations to external memory is stored in a table that is indexed by the instruction ID.



130.1012.02 

At a step 216, each load instruction's symbolic operands are compared against those of the store instructions being issued in the current clock cycle and those of all unretired store instructions. Because the records of these store operations are kept for comparison, there is a much higher probability of detecting a useful bypass in subsequent steps, in which the bypass element 121 further examines the syntax of any addresses the instructions 151 refer to as operands.
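Steps 215 and 216 can be sketched together in Python. The sketch assumes, hypothetically, that instruction IDs increase in program order, that a table entry is removed when its store retires, and that address syntax is a (base register, constant offset) pair. For each load it returns the youngest older store whose syntax matches, searching both the table of unretired stores and the stores issued in the same cycle. It treats a younger store with different syntax as not aliasing, which a real design would still have to confirm, for example against the computed addresses.

```python
from typing import Dict, List, Optional, Tuple

Syntax = Tuple[str, int]  # (base register, constant offset): illustrative only


class StoreTable:
    """Step 215: symbolic operands of unretired stores, indexed by instruction ID."""

    def __init__(self) -> None:
        self.entries: Dict[int, Syntax] = {}

    def record(self, instr_id: int, syntax: Syntax) -> None:
        self.entries[instr_id] = syntax

    def retire(self, instr_id: int) -> None:
        # A retired store no longer needs to be matched symbolically.
        self.entries.pop(instr_id, None)


def find_bypass(load_id: int, load_syntax: Syntax, table: StoreTable,
                same_cycle: List[Tuple[int, Syntax]]) -> Optional[int]:
    """Step 216: ID of the youngest older store with identical syntax, else None.

    `same_cycle` holds (ID, syntax) for stores issued in the current clock
    cycle that are not yet in the table. Assumes IDs grow in program order.
    """
    candidates = list(table.entries.items()) + same_cycle
    matches = [sid for sid, syn in candidates if sid < load_id and syn == load_syntax]
    return max(matches) if matches else None


if __name__ == "__main__":
    table = StoreTable()
    table.record(3, ("A", 16))
    table.record(5, ("A", 24))
    print(find_bypass(7, ("A", 16), table, []))                # 3
    print(find_bypass(7, ("A", 16), table, [(6, ("A", 16))]))  # 6
    table.retire(3)
    print(find_bypass(7, ("A", 16), table, []))                # None
```

Keying the table by instruction ID makes retirement a single deletion, and the `sid < load_id` test keeps a load from matching a store that follows it in program order.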

At a step 217, the bypass element 121 determines whether the operand addresses that the instructions 151 refer to include identical base address values and offset address values. If so, the bypass element 121 generates a bypass signal indicating that the instructions 151 refer to the same location in external memory. If not, the bypass element 121 does not generate a bypass signal. (In alternative embodiments, the bypass element 121 may generate an inverse bypass signal.) If the bypass element 121 generates a bypass signal, the method 200 proceeds with the flow point 220. If not, the method 200 proceeds with the step 221.

At a flow point 220, the bypass signal having been generated, the microprocessor 100 can act on the knowledge that the instructions 151 refer to identical locations in external memory. For example, if a first (store) instruction 151 and a second (load) instruction 151 refer to identical locations in external memory, the microprocessor 100 can proceed by using the data to be transferred by those instructions 151 from an internal register. The microprocessor 100 does not have to wait for external memory to complete actual store and load operations.



At a step 221, the instruction decode stage 120 couples the parts of the instruction 151, including information about the base address value and the offset address value, to the address computation stage 130.

At a step 222, the address computation stage 130 receives the base address value and the offset address value, and computes the effective reference address of the instruction 151.

At a step 223, the instruction decode stage 120 couples the parts of the instruction 151, including information about what operations the instructions 151 command the microprocessor 100 to perform and the syntax of any addresses the instructions 151 refer to as operands, to the instruction execution stage 140.



At a step 224, the address computation stage 130 couples the effective reference address of the instruction 151 to the instruction execution stage 140.

At a step 225, the first (store) instruction 151 is physically performed and completed by external memory.




At a step 226(a), if the bypass signal was generated, the microprocessor 100 proceeds without physically performing the second (load) instruction 151. Instead, the microprocessor 100 can use the identical data from its internal register, thus removing a relative delay in microprocessor 100 operation.

Alternatively, at a step 226(b), if the bypass signal was not generated, or if an inverse bypass signal was generated, the second (load) instruction 151 is physically performed and completed by external memory.
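One possible reading of the flow of Figure 2 can be traced in Python by listing the steps each instruction passes through. This is a hypothetical sketch, not the patented control logic: instructions are handled one at a time, every earlier store counts as unretired, no register is rewritten between a store and a matching load, address syntax is a (base register, constant offset) pair, and every instruction continues through steps 221 to 224 whether or not a bypass was found.

```python
from typing import List, Optional, Tuple

Syntax = Tuple[str, int]                # (base register, constant offset): illustrative
Decoded = Tuple[str, Optional[Syntax]]  # (kind, syntax); kind is "load", "store" or other


def method_200_trace(program: List[Decoded]) -> List[List[str]]:
    """Return, per instruction, the labels of the steps of method 200 it visits."""
    earlier_stores: List[Syntax] = []
    traces: List[List[str]] = []
    for kind, syntax in program:
        steps = ["211", "212", "213(a)", "213(b)", "214"]
        bypass = False
        if kind in ("load", "store"):
            steps += ["215", "216", "217"]
            if kind == "load":
                bypass = syntax in earlier_stores  # identical base and offset
            else:
                earlier_stores.append(syntax)      # record kept by step 215
            if bypass:
                steps.append("220")
        steps += ["221", "222", "223", "224"]
        if kind == "store":
            steps.append("225")
        elif kind == "load":
            steps.append("226(a)" if bypass else "226(b)")
        traces.append(steps)
    return traces


if __name__ == "__main__":
    program = [("add", None), ("store", ("A", 16)), ("load", ("A", 16)), ("load", ("A", 24))]
    for (kind, _), steps in zip(program, method_200_trace(program)):
        print(kind, " -> ".join(steps))
```

In this reading, a non-memory instruction skips directly from step 214 to step 221, and only a load whose syntax matches an earlier store reaches flow point 220 and step 226(a).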
Alternative Embodiment

Although preferred embodiments are disclosed herein, many variations are possible which remain within the concept, scope and spirit of the invention, and these variations would become clear to those skilled in the art after perusal of this application.

Claims

1. A method for operating a pipelined microprocessor, said method including steps for

detecting a first instruction that stores to a first memory location, said first instruction including syntax for computing an effective address for said first memory location;

detecting a second instruction that stores to a second memory location, said second instruction including syntax for computing an effective address for said second memory location;

determining, in response to said syntax for said first instruction and said syntax for said second instruction, a relationship between said first memory location and said second memory location, without computing said effective address for both said first memory location and said second memory location; and

determining whether to perform one of said first instruction and second instruction in response to said step of determining a relationship.

Abstract

The invention provides a method and system for operating a pipelined microprocessor more quickly, by detecting instructions that load from the same memory locations that were recently stored to, without having to actually compute the referenced external memory addresses. The microprocessor examines the symbolic structure of instructions as they are encountered, so as to be able to detect identical memory locations by examination of their symbolic structure. For example, in a preferred embodiment, instructions that store to and load from an identical offset from an identical register are determined to be referencing the identical memory location, without having to actually compute the complete physical target address.

210
Microprocessor is coupled to an instruction memory 150 which includes a plurality of instructions 151, and is ready to perform those instructions 151.

211
The microprocessor reads a sequence of instructions 151 from the memory 150 using instruction fetch stage 110.

212
Instruction fetch stage 110 couples the instructions 151 to the instruction decode stage 120.

213(a)
The instruction decode stage 120 parses the instructions 151 to determine whether they are instructions to load data from an external memory or store data to an external memory.

213(b)
The instruction decode stage 120 determines the syntax of any addresses in the external memory that the instructions 151 refer to as operands.

214
The bypass element 121 examines parts of the instruction 151, including information about what operations the instructions 151 command the microprocessor 100 to perform. If these operations are to load or store data, the method continues with step 215. If these operations are otherwise, the method continues with step 221.

Fig. 2A



215
A record of the symbolic operands of the store operations to external memory is stored in a table that is indexed by the instruction ID.

216
Each load instruction's operands are compared against the store instructions being issued in the ongoing clock cycle and those of all unretired store instructions.

217
The bypass element 121 determines whether the operand addresses that the instructions 151 refer to include identical base address values and offset address values. If so, the bypass element generates a bypass signal. If not, the bypass element does not generate a bypass signal.

220
The microprocessor can act on the knowledge that the instructions 151 refer to identical locations in an external memory.

221
The instruction decode stage 120 couples the parts of the instruction 151, including information about the base address value and the offset address value, to the address computation stage 130.

FROM FIG. 2A



222
The address computation stage 130 receives the base address value and the offset address value and computes the effective reference address of the instruction 151.

223
The instruction decode stage 120 couples the parts of the instruction 151 to the instruction execution stage 140.

224
The address computation stage 130 couples the effective reference address of the instruction 151 to the instruction execution stage 140.

225
The first (store) instruction is physically performed and completed by external memory.

226(a)
If the bypass signal was generated, the microprocessor proceeds without performing the second (load) instruction 151. If the bypass signal was not generated, the method proceeds at step 226(b).

226(b)
If the bypass signal was not generated (or if an inverse bypass signal was generated), the second (load) instruction 151 is physically performed and completed by external memory.



